
[Feat] Enable Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image#2339

Merged
wtomin merged 9 commits into vllm-project:main from yuanheng-zhao:feat/add-imagegen-layerwise
Apr 8, 2026

Conversation

@yuanheng-zhao
Contributor

@yuanheng-zhao yuanheng-zhao commented Mar 30, 2026


Purpose

This PR adds support for, and validates, layerwise CPU offloading on more diffusion models (or on the diffusion component of omni models).

Most of the work is testing and verifying that these models work with the feature. Out-of-scope issues or model-specific special handling may be addressed in a follow-up PR.

Models supported in this PR:

  • SD3.5, stabilityai/stable-diffusion-3.5-medium
  • Ovis-Image, AIDC-AI/Ovis-Image-7B
  • Nextstep_1, stepfun-ai/NextStep-1.1
  • LongCat-Image, meituan-longcat/LongCat-Image

Planned but not enabled in this PR:

  • GLM-Image - version mismatch
  • MammothModa2 - fails to run on my side: ValueError: Tokenizer class MammothUTokenizer does not exist or is not currently imported.
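
For background on what the feature does (a general sketch, not the vLLM-Omni implementation): layerwise CPU offloading keeps the model's weights in host memory and copies one layer to the device just before its forward pass, evicting it afterwards, so peak device memory is roughly one layer's weights plus activations rather than the whole model. A toy pure-Python simulation of the bookkeeping (all names here are hypothetical, for illustration only):

```python
# Toy simulation of layerwise CPU offloading: only one layer's weights are
# resident on the "device" at a time, so peak device memory ~= max layer size.
# Purely illustrative -- not the vLLM-Omni implementation.

class Layer:
    def __init__(self, name, weight_mb):
        self.name = name
        self.weight_mb = weight_mb
        self.on_device = False

    def forward(self, x):
        assert self.on_device, f"{self.name} must be resident before forward"
        return x + 1  # stand-in for real compute

class LayerwiseOffloader:
    def __init__(self, layers):
        self.layers = layers
        self.device_mb = 0  # current "GPU" weight memory
        self.peak_mb = 0

    def _load(self, layer):
        # Models the H2D copy issued just before this layer runs.
        layer.on_device = True
        self.device_mb += layer.weight_mb
        self.peak_mb = max(self.peak_mb, self.device_mb)

    def _evict(self, layer):
        # Models freeing the layer's device weights right after its forward.
        layer.on_device = False
        self.device_mb -= layer.weight_mb

    def run(self, x):
        for layer in self.layers:
            self._load(layer)
            x = layer.forward(x)
            self._evict(layer)
        return x

layers = [Layer(f"block{i}", 500) for i in range(24)]  # 24 x 500 MB = 12 GB of weights
off = LayerwiseOffloader(layers)
out = off.run(0)
print(out, off.peak_mb)  # peak device usage is one layer (500 MB), not 12000 MB
```

The trade-off is visible in the measurements below: device memory drops sharply, but every layer now pays a host-to-device transfer on every forward pass.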

Test Plan

Offline generation; see the follow-up comment for the detailed test commands:
#2339 (comment)

Test Result

Stats

*Tested on H100, single device.
*Peak memory recorded via DiffusionModelRunner._record_peak_memory.

| Model | Peak memory | Peak memory (layerwise) | Total gen time (s) | Total gen time (s, layerwise) |
| --- | --- | --- | --- | --- |
| stabilityai/stable-diffusion-3.5-medium | 20.15 GB reserved, 18.00 GB allocated, 2.15 GB pool overhead (10.7%) | 16.44 GB reserved, 14.05 GB allocated, 2.39 GB pool overhead (14.5%) | 1.9754 | 4.8757 |
| stepfun-ai/NextStep-1.1 | 29.58 GB reserved, 29.03 GB allocated, 0.55 GB pool overhead (1.9%) | 5.96 GB reserved, 4.90 GB allocated, 1.06 GB pool overhead (17.8%) | 73.7517 | 543.4749 |
| AIDC-AI/Ovis-Image-7B | 21.70 GB reserved, 19.51 GB allocated, 2.20 GB pool overhead (10.1%) | 9.56 GB reserved, 6.62 GB allocated, 2.94 GB pool overhead (30.8%) | 9.4076 | 26.6770 |
| meituan-longcat/LongCat-Image | 31.93 GB reserved, 29.67 GB allocated, 2.26 GB pool overhead (7.1%) | 21.74 GB reserved, 18.74 GB allocated, 2.99 GB pool overhead (13.8%) | 9.1991 | 24.6322 |

*Enabling layerwise offloading on stepfun-ai/NextStep-1.1 is strongly discouraged: it is an autoregressive model with a diffusion head that runs multiple denoising steps for every generated token, so a very large number of offload transfers occur.
*Total generation time increased in the profiling above when the feature was enabled. I suspect that for image generation the compute finishes faster than the CPU-GPU transfers, so the offloading cannot be fully overlapped; the same behavior was seen before on Qwen-Image image-gen tasks: #858 (comment). Further profiling on specific devices may be worthwhile.
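
As a quick sanity check, the pool-overhead percentages in the table are derivable from the reserved/allocated figures, and the slowdown factors follow from the timing columns. This recomputes both (values copied verbatim from the table):

```python
# Recompute derived stats from the results table (values copied from it).
stats = {
    "stabilityai/stable-diffusion-3.5-medium": dict(
        reserved=20.15, allocated=18.00, t=1.9754,
        lw_reserved=16.44, lw_allocated=14.05, lw_t=4.8757),
    "stepfun-ai/NextStep-1.1": dict(
        reserved=29.58, allocated=29.03, t=73.7517,
        lw_reserved=5.96, lw_allocated=4.90, lw_t=543.4749),
    "AIDC-AI/Ovis-Image-7B": dict(
        reserved=21.70, allocated=19.51, t=9.4076,
        lw_reserved=9.56, lw_allocated=6.62, lw_t=26.6770),
    "meituan-longcat/LongCat-Image": dict(
        reserved=31.93, allocated=29.67, t=9.1991,
        lw_reserved=21.74, lw_allocated=18.74, lw_t=24.6322),
}

for model, s in stats.items():
    # Pool overhead = (reserved - allocated) / reserved, as in the table.
    overhead = 100 * (s["reserved"] - s["allocated"]) / s["reserved"]
    lw_overhead = 100 * (s["lw_reserved"] - s["lw_allocated"]) / s["lw_reserved"]
    # Reserved-memory saving and end-to-end slowdown from enabling the feature.
    mem_saving = 100 * (1 - s["lw_reserved"] / s["reserved"])
    slowdown = s["lw_t"] / s["t"]
    print(f"{model}: overhead {overhead:.1f}% -> {lw_overhead:.1f}%, "
          f"memory saved {mem_saving:.1f}%, slowdown {slowdown:.2f}x")
```

The NextStep-1.1 numbers illustrate the note above: about 80% of reserved memory is saved, but at roughly a 7x slowdown, consistent with per-token offload traffic.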

Generated image comparison

| Model | Offloading disabled | Layerwise offloading enabled |
| --- | --- | --- |
| stabilityai/stable-diffusion-3.5-medium | output_sd3 | output_sd3_layerwise |
| AIDC-AI/Ovis-Image-7B | output_ovis_image | output_ovis_image_layerwise |
| stepfun-ai/NextStep-1.1 | output_nextstep | output_nextstep_layerwise |
| meituan-longcat/LongCat-Image | output_longcat | output_longcat_layerwise |

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
@yuanheng-zhao yuanheng-zhao force-pushed the feat/add-imagegen-layerwise branch from 0e22230 to e184a17 on April 6, 2026 06:22
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
@yuanheng-zhao
Contributor Author

yuanheng-zhao commented Apr 6, 2026

Text to image example

stabilityai/stable-diffusion-3.5-medium

```shell
python examples/offline_inference/text_to_image/text_to_image.py \
    --model stabilityai/stable-diffusion-3.5-medium \
    --prompt "A serene mountain landscape at sunset" \
    --negative-prompt "blurry, low quality, distorted" \
    --guidance-scale 4.5 \
    --num-inference-steps 28 \
    --height 1024 \
    --width 1024 \
    --seed 42 \
    --output output_sd3_layerwise.png \
    --enable-layerwise-offload
```

```shell
python examples/offline_inference/text_to_image/text_to_image.py \
    --model stabilityai/stable-diffusion-3.5-medium \
    --prompt "A serene mountain landscape at sunset" \
    --negative-prompt "blurry, low quality, distorted" \
    --guidance-scale 4.5 \
    --num-inference-steps 28 \
    --height 1024 \
    --width 1024 \
    --seed 42 \
    --output output_sd3.png
```

stepfun-ai/NextStep-1.1

```shell
python examples/offline_inference/text_to_image/text_to_image.py \
    --model stepfun-ai/NextStep-1.1 \
    --prompt "A baby panda wearing an Iron Man mask, holding a board with 'NextStep-1' written on it" \
    --height 512 \
    --width 512 \
    --num-inference-steps 28 \
    --guidance-scale 7.5 \
    --guidance-scale-2 1.0 \
    --cfg-schedule constant \
    --seed 42 \
    --output output_nextstep_layerwise.png \
    --enable-layerwise-offload \
    --init-timeout 1200 \
    --stage-init-timeout 1200
```

```shell
python examples/offline_inference/text_to_image/text_to_image.py \
    --model stepfun-ai/NextStep-1.1 \
    --prompt "A baby panda wearing an Iron Man mask, holding a board with 'NextStep-1' written on it" \
    --height 512 \
    --width 512 \
    --num-inference-steps 28 \
    --guidance-scale 7.5 \
    --guidance-scale-2 1.0 \
    --cfg-schedule constant \
    --seed 42 \
    --output output_nextstep.png
```

AIDC-AI/Ovis-Image-7B

```shell
python examples/offline_inference/text_to_image/text_to_image.py \
    --model AIDC-AI/Ovis-Image-7B \
    --prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail." \
    --height 1024 \
    --width 1024 \
    --num-inference-steps 50 \
    --guidance-scale 5.0 \
    --cfg-schedule constant \
    --seed 42 \
    --output output_ovis_image_layerwise.png \
    --enable-layerwise-offload
```

```shell
python examples/offline_inference/text_to_image/text_to_image.py \
    --model AIDC-AI/Ovis-Image-7B \
    --prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail." \
    --height 1024 \
    --width 1024 \
    --num-inference-steps 50 \
    --guidance-scale 5.0 \
    --cfg-schedule constant \
    --seed 42 \
    --output output_ovis_image.png
```

meituan-longcat/LongCat-Image

```shell
# Prompt (translated): A young Asian woman wearing a yellow knit sweater with a
# white necklace. Her hands rest on her knees, her expression serene. The
# background is a rough brick wall, with warm afternoon sunlight falling on her,
# creating a quiet, cozy atmosphere. A mid-distance shot highlights her
# expression and the details of her clothing. Soft light on her face emphasizes
# her features and the texture of her accessories, adding depth and warmth. The
# composition is simple; the brick texture and the play of light and shadow
# complement each other, underscoring her elegance and poise.
python examples/offline_inference/text_to_image/text_to_image.py \
    --model meituan-longcat/LongCat-Image \
    --prompt "一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。" \
    --height 768 \
    --width 1344 \
    --num-inference-steps 50 \
    --guidance-scale 4.0 \
    --seed 42 \
    --output output_longcat.png
```

```shell
python examples/offline_inference/text_to_image/text_to_image.py \
    --model meituan-longcat/LongCat-Image \
    --prompt "一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。" \
    --height 768 \
    --width 1344 \
    --num-inference-steps 50 \
    --guidance-scale 4.0 \
    --seed 42 \
    --output output_longcat_layerwise.png \
    --enable-layerwise-offload
```
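
All of the invocations above follow the same pattern: identical flags, with the offloaded run differing only in the `--output` name and the added `--enable-layerwise-offload` flag. A small dry-run helper that prints such a command pair for any model (`print_pair` is a hypothetical name, not part of the repo; it only echoes commands and runs nothing):

```shell
# Dry-run helper: print the baseline and layerwise-offload command pair
# for a given model and output stem. Illustrative only -- not in the repo.
print_pair() {
    model="$1"; out_stem="$2"; shift 2
    base="python examples/offline_inference/text_to_image/text_to_image.py --model $model $*"
    echo "$base --output ${out_stem}.png"
    echo "$base --output ${out_stem}_layerwise.png --enable-layerwise-offload"
}

print_pair stabilityai/stable-diffusion-3.5-medium output_sd3 \
    --seed 42 --num-inference-steps 28 --height 1024 --width 1024
```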

@yuanheng-zhao yuanheng-zhao changed the title from "[WIP][Feat] Support Layerwise CPU offloading for more image-gen models" to "[WIP][Feat] Support Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image" on Apr 6, 2026
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
@yuanheng-zhao yuanheng-zhao changed the title from "[WIP][Feat] Support Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image" to "[Feat] Support Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image" on Apr 6, 2026
@yuanheng-zhao yuanheng-zhao marked this pull request as ready for review April 6, 2026 13:09

@yuanheng-zhao yuanheng-zhao changed the title from "[Feat] Support Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image" to "[Feat] Enable Layerwise CPU offloading for SD3.5, Ovis-Image, Nextstep_1, LongCat-Image" on Apr 6, 2026
@yuanheng-zhao
Contributor Author

yuanheng-zhao commented Apr 7, 2026

PTAL @wtomin
cc @ZJY0516 @gcanlin

Collaborator

@gcanlin gcanlin left a comment


LGTM, please fix conflicts :)

@gcanlin gcanlin added and then removed the "ready label to trigger buildkite CI" label on Apr 7, 2026
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
@hsliuustc0106 hsliuustc0106 added the "ready label to trigger buildkite CI" label on Apr 7, 2026
@wtomin wtomin merged commit 8a55d3d into vllm-project:main Apr 8, 2026
8 checks passed
@yuanheng-zhao yuanheng-zhao deleted the feat/add-imagegen-layerwise branch April 8, 2026 02:11
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
…p_1, LongCat-Image (vllm-project#2339)

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
bob-021206 pushed a commit to jasonlee-1024/vllm-omni that referenced this pull request Apr 21, 2026
…p_1, LongCat-Image (vllm-project#2339)

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Signed-off-by: bob-021206 <binyan_github@163.com>

Labels

ready label to trigger buildkite CI
